Abstract

Lagrangian relaxation and approximate optimization algorithms have received much attention
in the last two decades. Typically, the running time of these methods to obtain an
$\epsilon$-approximate solution is proportional to $1/\epsilon^2$. Recently, Bienstock and Iyengar,
following Nesterov, gave an algorithm for fractional packing linear programs which runs in
$1/\epsilon$ iterations. The latter algorithm requires solving a convex quadratic program in every
iteration, an optimization subroutine which dominates the theoretical running time.

We give an algorithm for convex programs with strictly convex constraints which runs in time
proportional to $1/\epsilon$. The algorithm does not require solving any quadratic program, but uses
gradient steps and elementary operations only. Problems which have strictly convex constraints
include maximum entropy frequency estimation, portfolio optimization with loss risk constraints,
and various computational problems in signal processing.

As a side product, we also obtain a simpler version of Bienstock and Iyengar's result for 
general linear programming, with similar running time. 

We derive these algorithms using a new framework for deriving convex optimization algo- 
rithms from online game playing algorithms, which may be of independent interest. 



1 Introduction 

The design of efficient approximation algorithms for certain convex and linear programs has received
much attention in the previous two decades. Since interior point methods and other polynomial
time algorithms are often too slow in practice [Bie01], researchers have tried to design approximation
algorithms. Shahrokhi and Matula [SM90] developed the first approximation algorithm for the
maximum concurrent flow problem. Their result spurred a great deal of research, which generalized
the techniques to broader classes of problems (linear programming, semi-definite programming,
packing and covering convex programs) and improved the running time [LSM+91, KPST94, PST91,
GK94, GK98, Fle00, GK95, KL96, AHK05b].

In this paper we consider approximations to more general convex programs. The convex feasibility
problem we consider is of the following form (the optimization version can be reduced to this
feasibility problem by binary search),

$$f_j(x) \le 0 \quad \forall j \in [m], \qquad x \in S_n \tag{1}$$

where $\{f_j, j \in [m]\}$ is a (possibly infinite) set of convex constraints and
$S_n = \{x \in \mathbb{R}^n : \sum_i x_i = 1,\ x_i \ge 0\}$ is the unit simplex. Our algorithms work
almost without change if the simplex is replaced by other simple convex bodies such as the ball or
hypercube. The more general version, where $S_n$ is replaced by an arbitrary convex set in
Euclidean space, can also be handled at the expense of slower running time (see Section 3.1).

We say that an algorithm gives an $\epsilon$-approximate solution to the above program if it returns
$x \in \mathcal{P}$ (here $\mathcal{P} = S_n$ denotes the underlying convex set) such that
$\forall j \in [m] .\ f_j(x) \le \epsilon$, or returns a proof that the program is infeasible. Hence, in
this paper we consider an additive notion of approximation. A multiplicative $\epsilon$-approximation is
an $x \in \mathcal{P}$ such that $\forall j \in [m] .\ f_j(x) \le \lambda^*(1+\epsilon)$, where
$\lambda^* = \min_{x \in \mathcal{P}} \max_{i \in [m]} f_i(x)$. There are two standard
reductions which convert an additive approximation into a multiplicative approximation. Both of
these reductions are orthogonal to our results and can be applied to our algorithms. The first
is based on simple scaling, and is standard in previous work (see [PST91, You95, AHK05a]) and
increases the running time by a factor of t»- For the special case fractional packing and covering 
problems, there is a different reduction based on binary search which increases the running time 
only by a poly-logarithmic factor [BI04, Nes04].

A common feature of all of the prior algorithms is that they can be viewed, sometimes implicitly,
as Frank-Wolfe [FW56] algorithms, in that they iterate by solving an optimization problem over
$S_n$ (more generally over the underlying convex set), and take convex combinations of iterates. The
optimization problem that is iteratively solved is of the following form.

$$\forall p \in S_m .\quad \text{Optimization Oracle}(p) =
\begin{cases}
x \in S_n \text{ s.t. } \sum_j p_j f_j(x) \le 0 & \text{if such } x \text{ exists} \\
\text{FAIL} & \text{otherwise}
\end{cases}$$



It is possible to extend the methods of PST [PST91] and others to problems such as (1) (see
[Jan06, Kha04]) and obtain the following theorem. Henceforth $\omega$ stands for the width of the
instance, a measure of the size of the instance numbers, defined as

$$\omega = \max_{j \in [m]} \max_{x \in S_n} f_j(x) - \min_{j \in [m]} \min_{x \in S_n} f_j(x).$$

Theorem 1 (previous work). There exists an algorithm that for any $\epsilon > 0$ returns an
$\epsilon$-approximate solution to mathematical program (1). The algorithm makes at most
$\tilde{O}(\frac{\omega^2}{\epsilon^2})$ calls to Optimization Oracle, and requires $O(m)$ time between
successive oracle calls.



Remark 1: Much previous work focuses on reducing the dependence of the running time on
the width. Linear dependence on $\omega$ was achieved for special cases such as packing and covering
problems (see [You95]). For covering and packing problems the dependence on the width can be
removed completely, albeit introducing another factor of $n$ into the running time [Jan06]. These
results are orthogonal to ours, and it is possible that the ideas can be combined.

Remark 2: In case the constraint functions are linear, Optimization Oracle can be implemented
in time $O(mn)$. Otherwise, the oracle reduces to optimization of a convex non-linear
function over a convex set.
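For the linear case mentioned in Remark 2, the oracle can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes constraints of the form $f_j(x) = A_j \cdot x - b_j$, and the function name and argument layout are hypothetical. It relies on the fact that a linear function over the simplex is minimized at a vertex.

```python
from typing import List, Optional


def optimization_oracle_linear(A: List[List[float]], b: List[float],
                               p: List[float]) -> Optional[List[float]]:
    """Return x in the simplex with sum_j p_j * (A[j].x - b[j]) <= 0, or None (FAIL).

    Hypothetical helper: assumes constraints f_j(x) = A[j].x - b[j].
    """
    m, n = len(A), len(A[0])
    # c[i] = sum_j p_j * A[j][i]: coefficients of the aggregated constraint, O(mn) work.
    c = [sum(p[j] * A[j][i] for j in range(m)) for i in range(n)]
    offset = sum(p[j] * b[j] for j in range(m))
    # A linear function over the simplex attains its minimum at a vertex e_i.
    i_best = min(range(n), key=lambda i: c[i])
    if c[i_best] - offset > 0:
        return None  # even the best vertex violates the aggregated constraint
    x = [0.0] * n
    x[i_best] = 1.0
    return x
```

The aggregation step touches every coefficient once, which matches the $O(mn)$ bound stated in the remark.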



Klein and Young [KY99] proved an $\Omega(\epsilon^{-2})$ lower bound for Frank-Wolfe type algorithms for
covering and packing linear programs under appropriate conditions. This bound applies to all prior
Lagrangian relaxation algorithms until the recent result of Bienstock and Iyengar [BI04]. They give
an algorithm for solving packing and covering linear programs in time linear in $1/\epsilon$, proving

Theorem 2 ([BI04]). There exists an algorithm that for any $\epsilon > 0$ returns an $\epsilon$-approximate
solution to packing or covering linear programs with $m$ constraints. The algorithm makes at most
0{~ ) iterations. Each iteration requires solving a convex separable quadratic program. The algo- 
rithm requires 0(mn) time between successive oracle calls. 

Their algorithm has a non-combinatorial component, viz., solving convex separable quadratic
programs. To solve these convex programs one can use interior point methods, which have a large
polynomial running time that largely dominates the entire running time of the algorithm. The [BI04]
algorithm is based on previous algorithms by Nesterov [Nes04] for special cases of linear and conic
programming. Nesterov's algorithm pre-computes a quadratic program, which also dominates the
running time of his algorithm.

1.1 Our results 

We give simple approximation algorithms for convex programs whose running time is linear in $1/\epsilon$.
The algorithms require only gradient computations and combinatorial operations (or a separation
oracle more generally), and do not need to solve quadratic programs.

The $\Omega(\epsilon^{-2})$ lower bound of Klein and Young is circumvented by using the strict convexity of
the constraints. The constraint functions are said to be strictly convex if there exists a positive
real number $H > 0$ such that $\min_{j \in [m]} \min_{x \in \mathcal{P}} \nabla^2 f_j(x) \succeq H \cdot I$ $^1$.
In other words, the Hessian of each constraint function is positive definite (as opposed to positive
semi-definite) with smallest eigenvalue at least $H > 0$.

Our running time bounds depend on the gradients of the constraint functions as well. Let
$G = \max_{j \in [m]} \max_{x \in \mathcal{P}} \|\nabla f_j(x)\|_2$ be an upper bound on the norm of the
gradients of the constraint functions. $G$ is related to the width of the convex program: for linear
constraints, the gradients are simply the coefficients of the constraints, and the width is the
largest coefficient. Hence, $G$ is at most $\sqrt{n}$ times the width. In Section 3.1 we prove the
following theorem.

Theorem 3 (Main 1). There exists an algorithm that for any $\epsilon > 0$ returns an $\epsilon$-approximate
solution to mathematical program (1). The algorithm makes at most $\tilde{O}(\frac{G^2}{H\epsilon})$ calls
to Separation Oracle, and requires a single gradient computation and additional $O(n)$ time between
successive oracle calls.

Remark: Commonly the gradient of a given function can be computed in time which is linear in 
the function representation. Examples of functions which admit linear-time gradient computation 
include polynomials, logarithmic functions and exponentials. 

The separation oracle which our algorithm invokes is defined as

$$\forall x \in S_n .\quad \text{Separation Oracle}(x) =
\begin{cases}
j \in [m] \text{ s.t. } f_j(x) > \epsilon & \text{if such } f_j \text{ exists} \\
\text{FAIL} & \text{otherwise}
\end{cases}$$

If the constraints are given explicitly, this oracle is often easy to implement in time linear in
the input size. Such constraints include linear functions, polynomials and logarithms. This oracle
is also easy to implement in parallel: the constraints can be distributed amongst the available
processors and evaluated in parallel.

For all cases in which $H$ is zero or too small the theorem above cannot be applied. However,
we can apply a simple reduction to strictly convex constraints and obtain the following corollary.

Corollary 4. For any $\epsilon > 0$, there exists an algorithm that returns an $\epsilon$-approximate
solution to mathematical program (1). The algorithm makes at most $\tilde{O}(\frac{G^2}{\epsilon^2})$ calls
to Separation Oracle and requires additional $O(n)$ time and a single gradient computation between
successive oracle calls.

$^1$ We denote $A \succeq B$ if the matrix $A - B$ is positive semi-definite.

In comparison to Theorem 1 this corollary may require $O(n)$ more iterations. However, each
iteration requires a call to Separation Oracle, as opposed to Optimization Oracle. A Separation
Oracle requires only function evaluation, which can often be implemented in time linear
in the input size, whereas an Optimization Oracle could require expensive operations such
as matrix inversions.
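To illustrate why the separation oracle only needs function evaluations, here is a minimal sketch. It assumes the constraints are supplied as plain Python callables; the function name and signature are illustrative and not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence


def separation_oracle(constraints: Sequence[Callable[[List[float]], float]],
                      x: List[float], eps: float) -> Optional[int]:
    """Return the index j of a constraint with f_j(x) > eps, or None for FAIL.

    Hypothetical helper: one pass over the constraints, one evaluation each.
    """
    for j, f in enumerate(constraints):
        if f(x) > eps:
            return j
    return None
```

Each constraint is evaluated independently, so the loop could also be split across processors as described above.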

There is yet another alternative to deal with linear constraints and yet obtain linear dependence
on $1/\epsilon$. This is given by the following theorem. The approximation algorithm runs in time linear
in $1/\epsilon$, and yet does not require a lower bound on $H$. The downside of this algorithm is the
computation of "generalized projections". A generalized projection of a vector $y \in \mathbb{R}^n$ onto a
convex set $\mathcal{P}$ with respect to a PSD matrix $A \succeq 0$ is defined to be
$\Pi^A_{\mathcal{P}}(y) = \arg\min_{x \in \mathcal{P}} (x - y)^\top A (x - y)$. Generalized
projections can be cast as convex mathematical programs. If the underlying set is simple, such as
the ball or simplex, then the program reduces to a convex quadratic program.

Theorem 5 (Main 2). There exists an algorithm that for any $\epsilon > 0$ returns an $\epsilon$-approximate
solution to mathematical program (1). The algorithm makes at most $\tilde{O}(\frac{nG}{\epsilon})$ calls to
Separation Oracle and requires computation of a generalized projection onto $S_n$, a single gradient
computation and additional $O(n^2)$ time between successive oracle calls.

An example of an application of the above theorem is the following linear program.

$$\forall j \in [m] .\ A_j \cdot x \ge 0, \qquad x \in S_n \tag{2}$$

It is shown in [DV04] that general linear programming can be reduced to this form, and that
without loss of generality, $\forall j \in [m] .\ \|A_j\| = 1$. This format is called the "perceptron"
format for linear programs. As a corollary to Theorem 5 we obtain

Corollary 6. There exists an algorithm that for any $\epsilon > 0$ returns an $\epsilon$-approximate solution
to linear program (2). The algorithm makes $\tilde{O}(\frac{n}{\epsilon})$ iterations. Each iteration
requires $O(n(m + n))$ computing time plus computation of a generalized projection onto the simplex.

Theorem 5 and Corollary 6 extend the result of Bienstock and Iyengar [BI04] to general convex
programming$^2$. The running time of the algorithm is very similar to theirs: the number of
iterations is the same, and each iteration also requires solving convex quadratic programs
(generalized projections onto the simplex in our case). Our algorithm is very different from [BI04]. The
analysis is simpler, and relies on recent results from online learning. We note that the algorithm of
Bienstock and Iyengar allows improved running time for sparse instances, whereas our algorithm
currently does not.

1.2 Lagrangian relaxation and solving zero sum games 

The relation between Lagrangian relaxation and solving zero sum games was implicit in the original
PST work, and explicit in the work of Freund and Schapire on online game playing [FS99] (the
general connection between zero sum games and linear programming goes back to von Neumann).

Most previous Lagrangian relaxation algorithms can be viewed as reducing the optimization
problem at hand to a zero sum game, and then applying a certain online game playing algorithm,
the Multiplicative Weights algorithm, to solve the game.

Our main insight is that the Multiplicative Weights algorithm can be replaced by any online
convex optimization algorithm (see next section for a precise definition). Recent developments in
online game playing introduce algorithms with much better performance guarantees for online
games with convex payoff functions [AH05, HKKA06]. Our results are derived by reducing convex
optimization problems to games with payoffs which stem from convex functions, and using the new
algorithms to solve these games.

$^2$ Bienstock and Iyengar's techniques can also be extended to full linear programming by introducing
dependence on the width which is similar to that of our algorithms [Bie06].

The online framework also provides an alternative explanation of the aforementioned Klein
and Young $\Omega(\epsilon^{-2})$ lower bound on the number of iterations required by Frank-Wolfe algorithms
to produce an $\epsilon$-approximate solution. Translated to the online framework, previous algorithms
were based on online algorithms with $\Omega(\sqrt{T})$ regret (the standard performance measure for online
algorithms, see next section for a precise definition). Our linear dependence on $1/\epsilon$ is the
consequence of using online algorithms with $O(\log T)$ regret. This is formalized in Appendix A.
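The link between the regret rate and the iteration count can be made concrete with a small numeric sketch. It assumes illustrative regret bounds of the form $\sqrt{T}$ and $1 + \ln T$ (constants dropped) and a hypothetical helper `smallest_T` that finds the first $T$ at which the bound falls below $\epsilon T$.

```python
import math
from typing import Callable


def smallest_T(regret_bound: Callable[[int], float], eps: float) -> int:
    """Smallest T >= 1 with regret_bound(T) <= eps * T.

    Assumes regret_bound(T) / T is non-increasing in T, which holds for the
    sqrt(T) and 1 + ln(T) bounds used here.
    """
    hi = 1
    while regret_bound(hi) > eps * hi:
        hi *= 2  # doubling search for an upper bracket
    lo = hi // 2  # either 0 or a value that still violates the inequality
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if regret_bound(mid) <= eps * mid:
            hi = mid
        else:
            lo = mid
    return hi


def sqrt_regret(T: int) -> float:
    return math.sqrt(T)


def log_regret(T: int) -> float:
    return 1.0 + math.log(T)
```

With these illustrative bounds, `smallest_T(sqrt_regret, eps)` grows like $1/\epsilon^2$, while `smallest_T(log_regret, eps)` grows like $(1/\epsilon)\log(1/\epsilon)$.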

2 The general scheme 

We outline a general scheme for approximately solving convex programs using online convex opti- 
mization algorithms. This is a generalization of previous methods which also allows us to derive 
our results stated in the previous section. 

For this section we consider the following general mathematical program, which generalizes (1)
by allowing an arbitrary convex set $\mathcal{P}$.

$$f_j(x) \le 0 \quad \forall j \in [m], \qquad x \in \mathcal{P} \tag{3}$$

In order to approximately solve (3), we reduce the mathematical program to a game between
two players: a primal player who tries to find a feasible point and a dual player who tries to
disprove feasibility. This reduction is formalized in the following definition.

Definition 1. The associated game with mathematical program (3) is between a primal player that
plays $x \in \mathcal{P}$ and a dual player which plays a distribution over the constraints $p \in S_m$.
For a point played by the primal player and a distribution of the dual player, the loss that the
primal player incurs (and the payoff gained by the dual player) is given by the following function

$$\forall x \in \mathcal{P},\ p \in S_m .\quad g(x,p) = \sum_j p_j f_j(x)$$

The value of this game is defined to be $\lambda^* = \min_{x \in \mathcal{P}} \max_{p \in S_m} g(x,p)$.
Mathematical program (3) is feasible iff $\lambda^* \le 0$.

By the above reduction, in order to check feasibility of mathematical program (3), it suffices
to compute the value of the associated game $\lambda^*$. Notice that the game loss/payoff function $g$ is
smooth over the convex sets $S_m$ and $\mathcal{P}$, linear with respect to $p$ and convex with respect to
$x$. For such functions, generalizations of the von Neumann minimax theorem, such as [Sio58]$^3$, imply
that

$$\lambda^* = \min_{x \in \mathcal{P}} \max_{p \in S_m} g(x,p) = \max_{p \in S_m} \min_{x \in \mathcal{P}} g(x,p)$$

This suggests a natural approach to evaluate $\lambda^*$: simulate a repeated game between the primal
and dual players such that in each iteration the game loss/payoff is determined according to the
function $g$. In the simulation, the players play according to an online algorithm.

$^3$ All algorithms and theorems in this paper can be proved without relying on this minimax theorem.
In fact, our results provide a new algorithmic proof of the generalized min-max theorem, which is
included in the appendix.
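The loss function of Definition 1 and the dual player's best response can be sketched in a few lines. This is an illustration under the assumption that constraints are given as Python callables; both function names are hypothetical. Because $g$ is linear in $p$, the maximum over the simplex is attained at a single most-violated constraint.

```python
from typing import Callable, List, Sequence

Constraint = Callable[[List[float]], float]


def payoff(constraints: Sequence[Constraint], x: List[float], p: List[float]) -> float:
    """g(x, p) = sum_j p_j * f_j(x): primal loss and dual payoff."""
    return sum(pj * f(x) for pj, f in zip(p, constraints))


def dual_best_response(constraints: Sequence[Constraint], x: List[float]) -> List[float]:
    """A maximizer of g(x, .) over the simplex: all weight on the largest f_j(x)."""
    m = len(constraints)
    j = max(range(m), key=lambda k: constraints[k](x))
    p = [0.0] * m
    p[j] = 1.0
    return p
```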

The online algorithms we consider fall into the online convex optimization framework [Zin03],
in which there is a fixed convex compact feasible set $\mathcal{P} \subseteq \mathbb{R}^n$ and an arbitrary,
unknown sequence of convex cost functions $f_1, f_2, \ldots : \mathcal{P} \to \mathbb{R}$. The decision
maker must make a sequence of decisions, where the $t$-th decision is a selection of a point
$x_t \in \mathcal{P}$ and there is a cost of $f_t(x_t)$ on period $t$. However, $x_t$ is chosen with only
the knowledge of the set $\mathcal{P}$, the previous points $x_1, \ldots, x_{t-1}$, and the previous
functions $f_1, \ldots, f_{t-1}$. The standard performance measure for online convex optimization
algorithms is called regret, which is defined as:

$$\text{Regret}(A, T) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{P}} \sum_{t=1}^{T} f_t(x)$$

We say that an algorithm $A$ has low regret if $\text{Regret}(A, T) = o(T)$. Later, we refer to the
procedure OnlineAlg, by which we mean any low-regret algorithm for this setting.
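As a small worked instance of the regret measure, the sketch below computes the regret of a played sequence for linear costs $f_t(x) = c_t \cdot x$ over the simplex. The restriction to linear costs and the function name are assumptions made for illustration; with linear costs the best fixed point in hindsight is a vertex, so the minimum is a minimum over coordinates.

```python
from typing import List


def regret_linear(costs: List[List[float]], plays: List[List[float]]) -> float:
    """Regret of plays x_1..x_T against the best fixed simplex point for f_t(x) = c_t . x."""
    incurred = sum(sum(ci * xi for ci, xi in zip(c, x)) for c, x in zip(costs, plays))
    n = len(costs[0])
    # Total cost of always playing vertex e_i; the best fixed point is the cheapest vertex.
    totals = [sum(c[i] for c in costs) for i in range(n)]
    return incurred - min(totals)
```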

Another crucial property of online convex optimization algorithms is their running time. The
running time is the time it takes to produce the point $x_t \in \mathcal{P}$ given all prior game history.

The running time of our approximate optimization algorithms will depend on these two parameters
of online game playing algorithms: regret and running time. In the appendix we survey some
of the known online convex optimization algorithms and their properties.

We suggest three methods for approximately solving (3) using the approach outlined above. The first
"meta algorithm" (it allows freedom in the choice of implementation of the online algorithm) is
called PrimalGameOpt and depicted in Figure 1. For this approach, the dual player is simulated
by an optimal adversary: at iteration $t$ it plays a dual strategy $p_t$ that achieves at least the game
value $\lambda^*$ (this reduces exactly to Separation Oracle).

The implementation of the primal player is an online convex optimization algorithm with low
regret, which we denote by OnlineAlg. This online convex optimization algorithm produces
decisions which are points in the convex set $\mathcal{P}$. The cost functions
$f_1, f_2, \ldots : \mathcal{P} \to \mathbb{R}$ are determined by the dual player's distributions. At
iteration $t$, if the distribution output by the dual player is $p_t$, then the cost function to the
online player is

$$\forall x \in \mathcal{P} .\quad f_t(x) = g(x, p_t)$$

The low-regret property of the online algorithm used ensures that in the long run, the average
strategy of the primal player will converge to the optimal strategy. Hence the average loss will
converge to $\lambda^*$.

The "dual" version of this approach, in which the dual player is simulated by an online algorithm
and the primal by an oracle, is called DualGameOpt. In this case, the adversarial implementation
of the primal player reduces to Optimization Oracle. The dual player now plays according to an
online algorithm OnlineAlg. This online algorithm produces points in the $m$-dimensional simplex,
the set of all distributions over the constraints. The payoff functions are determined according to
the decisions of the primal player: at iteration $t$, if the primal player produced point
$x_t \in \mathcal{P}$, the payoff function is

$$\forall p \in S_m .\quad f_t(p) = g(x_t, p)$$

We also explore a third option, in which both players are implemented by online algorithms.
This is called the PrimalDualGameOpt meta-algorithm. Pseudo-code for all versions is given
in Figure 1.

The following theorem shows that all these approaches yield an $\epsilon$-approximate solution when
the online convex optimization algorithm used to implement OnlineAlg has low regret.
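
PrimalGameOpt ($\epsilon$)

Let $t \leftarrow 1$. While $\text{Regret}(\text{OnlineAlg}, t) \ge \epsilon t$ do

• Let $x_t \leftarrow$ OnlineAlg $(p_1, \ldots, p_{t-1})$.

• Let $j \leftarrow$ Separation Oracle $(x_t)$. If FAIL return $x_t$. Let $p_t \leftarrow e_j$, where $e_j$ is the
$j$'th standard basis vector of $\mathbb{R}^m$.

• $t \leftarrow t + 1$

Return $\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t$

DualGameOpt ($\epsilon$)

Let $t \leftarrow 1$. While $\text{Regret}(\text{OnlineAlg}, t) \ge \epsilon t$ do

• Let $p_t \leftarrow$ OnlineAlg $(x_1, \ldots, x_{t-1})$.

• Let $x_t \leftarrow$ Optimization Oracle $(p_t)$. If FAIL return $p_t$.

• $t \leftarrow t + 1$

Return $\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_t$

PrimalDualGameOpt ($\epsilon$)

Let $t \leftarrow 1$. While $\text{Regret}(\text{OnlineAlg}, t) \ge \frac{\epsilon}{2} t$ do

• Let $x_t \leftarrow$ OnlineAlg $(p_1, \ldots, p_{t-1})$.

• Let $p_t \leftarrow$ OnlineAlg $(x_1, \ldots, x_{t-1})$.

• $t \leftarrow t + 1$

If $\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_t$ is $\epsilon$-approximate return $\bar{x}$. Else, return
$\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t$.

Figure 1: Meta algorithms for approximate optimization by online game playing. Here $T$ denotes the
number of completed iterations.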
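The PrimalGameOpt loop of Figure 1 can be sketched as a higher-order function. This is a hedged illustration rather than the paper's implementation: the online algorithm, the oracle and the regret bound are passed in as callables, and all parameter names are hypothetical. The regret of the online algorithm is represented by a caller-supplied upper bound `regret_bound(t)`.

```python
from typing import Callable, List, Optional, Tuple


def primal_game_opt(online_alg: Callable[[List[List[float]]], List[float]],
                    separation_oracle: Callable[[List[float]], Optional[int]],
                    regret_bound: Callable[[int], float],
                    eps: float, m: int) -> Tuple[str, List[float]]:
    """Sketch of PrimalGameOpt.

    online_alg(history) maps the dual plays p_1..p_{t-1} to the next primal point x_t.
    separation_oracle(x) returns a violated constraint index or None (FAIL).
    Returns ("primal", x) for an eps-approximate point, else ("dual", p_bar).
    """
    history: List[List[float]] = []
    t = 1
    while regret_bound(t) >= eps * t:
        x = online_alg(history)
        j = separation_oracle(x)
        if j is None:
            return "primal", x
        p = [0.0] * m  # dual player answers with the vertex e_j
        p[j] = 1.0
        history.append(p)
        t += 1
    if not history:
        raise ValueError("regret bound already below eps * t at t = 1")
    T = len(history)
    p_bar = [sum(p[k] for p in history) / T for k in range(m)]
    return "dual", p_bar
```

Passing the regret bound as a function keeps the sketch independent of which low-regret algorithm is plugged in.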

Theorem 7. Suppose OnlineAlg is an online convex optimization algorithm with low regret. If
a solution to mathematical program (3) exists, then meta-algorithms PrimalGameOpt,
DualGameOpt and PrimalDualGameOpt return an $\epsilon$-approximate solution. Otherwise,
PrimalGameOpt and DualGameOpt return a dual solution proving that the mathematical program is
infeasible, and PrimalDualGameOpt returns a dual solution proving the mathematical program
to be $\epsilon$-close to being infeasible.

Further, an $\epsilon$-approximate solution is returned in $O(R)$ iterations, where
$R = R(\text{OnlineAlg}, \epsilon)$ is the smallest number $T$ which satisfies the inequality
$\text{Regret}(\text{OnlineAlg}, T) \le \epsilon T$.

Proof. Part 1: correctness of PrimalGameOpt

If at iteration $t$ Separation Oracle returns FAIL, then by definition of Separation Oracle,

$$\forall p^* .\ g(x_t, p^*) \le \epsilon \ \Rightarrow\ \forall j \in [m] .\ f_j(x_t) \le \epsilon$$

implying that $x_t$ is an $\epsilon$-approximate solution.

Otherwise, for every iteration $g(x_t, p_t) > \epsilon$, and we can construct a dual solution as follows.
Since the online algorithm guarantees sub-linear regret, for some iteration $T$ the regret will be
$R \le \epsilon T$. By definition of regret we have for any strategy $x^* \in \mathcal{P}$,

$$\epsilon < \frac{1}{T} \sum_{t=1}^{T} g(x_t, p_t) \le \frac{1}{T} \sum_{t=1}^{T} g(x^*, p_t) + \frac{R}{T}
\le \frac{1}{T} \sum_{t=1}^{T} g(x^*, p_t) + \epsilon \le g(x^*, \bar{p}) + \epsilon$$

where the last inequality is by the concavity (linearity) of $g(x,p)$ with respect to $p$. Thus,

$$\forall x^* .\ g(x^*, \bar{p}) > 0$$

Hence $\bar{p}$ is a dual solution proving that the mathematical program is infeasible.

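Part 2: correctness of DualGameOpt. The proof of this part is analogous to the first, and
given in the full version of this paper.

Suppose that for some iteration $t$ Optimization Oracle returns FAIL. Then, according to the
definition of Optimization Oracle,

$$\forall x \in \mathcal{P} .\ g(x, p_t) > 0$$

implying that $p_t$ is a dual solution proving the mathematical program to be infeasible.

Else, in every iteration $g(x_t, p_t) \le 0$. As before, for some iteration $T$ the regret of the online
algorithm will be $R \le \epsilon T$. By definition of regret we have (note that this time the online player
wants to maximize his payoff), for every distribution $p^* \in \mathcal{P}(\mathcal{F})$ over the set of
constraints $\mathcal{F} = \{f_j\}$,

$$\forall p^* \in \mathcal{P}(\mathcal{F}) .\quad 0 \ge \frac{1}{T} \sum_{t=1}^{T} g(x_t, p_t)
\ge \frac{1}{T} \sum_{t=1}^{T} g(x_t, p^*) - \frac{R}{T} \ge \frac{1}{T} \sum_{t=1}^{T} g(x_t, p^*) - \epsilon$$

Changing sides and using the convexity of the function $g(x,p)$ with respect to $x$ (which follows
from the convexity of the functions $f \in \mathcal{F}$) we obtain (for $\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_t$)

$$\forall p^* \in \mathcal{P}(\mathcal{F}) .\quad g(\bar{x}, p^*) \le \frac{1}{T} \sum_{t=1}^{T} g(x_t, p^*) \le \epsilon$$

which in turn implies that

$$\forall f \in \mathcal{F} .\ f(\bar{x}) \le \epsilon$$

Hence $\bar{x}$ is an $\epsilon$-approximate solution.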

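Part 3: correctness of PrimalDualGameOpt

Denote by $R_1, R_2$ the regrets attained by both online algorithms respectively. Using the low regret
properties of the online algorithms we obtain for any $x^*, p^*$

$$\forall x^*, p^* .\quad \sum_{t=1}^{T} g(x_t, p^*) - R_1 \le \sum_{t=1}^{T} g(x_t, p_t) \le \sum_{t=1}^{T} g(x^*, p_t) + R_2 \tag{5}$$

Let $x^*$ be such that $\forall p \in \mathcal{P}(\mathcal{F}) .\ g(x^*, p) \le \lambda^*$. By convexity of
$g(x,p)$ with respect to $x$,

$$\forall p^* .\quad g(\bar{x}, p^*) \le \frac{1}{T} \sum_{t=1}^{T} g(x_t, p^*)
\le \frac{1}{T} \sum_{t=1}^{T} g(x^*, p_t) + \frac{R_1 + R_2}{T} \le \lambda^* + \epsilon$$

Similarly, let $p^*$ be such that $\forall x \in \mathcal{P} .\ g(x, p^*) \ge \lambda^*$. Then by concavity of
$g$ with respect to $p$ and equation (5) we have

$$\forall x^* .\quad g(x^*, \bar{p}) \ge \frac{1}{T} \sum_{t=1}^{T} g(x^*, p_t)
\ge \frac{1}{T} \sum_{t=1}^{T} g(x_t, p^*) - \frac{R_1 + R_2}{T} \ge \lambda^* - \epsilon$$

Hence, if $\lambda^* \le 0$, then $\bar{x}$ satisfies

$$\forall p^* .\ g(\bar{x}, p^*) \le \epsilon \ \Rightarrow\ \forall j \in [m] .\ f_j(\bar{x}) \le \epsilon$$

and hence is an $\epsilon$-approximate solution. Else,

$$\forall x^* .\ g(x^*, \bar{p}) > -\epsilon$$

and $\bar{p}$ is a dual solution proving that the following mathematical program is infeasible.

$$f_j(x) \le -\epsilon \quad \forall j \in [m], \qquad x \in \mathcal{P}$$

□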

3 Applications 

3.1 Strictly convex programs 

We start with the easiest and perhaps most surprising application of Theorem 7. Recall the
feasibility problem we are considering:

$$f_j(x) \le 0 \quad \forall j \in [m], \qquad x \in S_n \tag{6}$$

where the functions $\{f_j\}$ are all strictly convex such that
$\forall x \in S_n, j \in [m] .\ \nabla^2 f_j(x) \succeq H \cdot I_n$ and $\|\nabla f_j(x)\|_2 \le G$.

Proof of Theorem 3. Consider the associated game with value

$$\lambda^* = \min_{x \in S_n} \max_{j \in [m]} f_j(x)$$

The convex problem is feasible iff $\lambda^* \le 0$. To approximate $\lambda^*$, we apply the PrimalGameOpt
meta algorithm. In this case, the vectors $x_t$ are points in the simplex, and $p_t$ are distributions over
the constraints. The online algorithm used to implement OnlineAlg is Online Convex Gradient
Descent (OCGD). The resulting algorithm is strikingly simple, as depicted in Figure 2.

According to Theorem 1 in [HKKA06], the regret of OCGD is bounded by
$\text{Regret}(T) = O(\frac{G^2}{H} \log T)$. Hence, the number of iterations until the regret drops to
$\epsilon T$ is $\tilde{O}(\frac{G^2}{H\epsilon})$. According to Theorem 7 this is the number of iterations
required to obtain an $\epsilon$-approximation.

In each iteration, the OCGD algorithm needs to update the current online strategy (the vector
$x_t$) according to the gradient and project onto $S_n$. This requires a single gradient computation. A
projection of a vector $y \in \mathbb{R}^n$ onto $S_n$ is defined to be
$\Pi_{S_n}(y) = \arg\min_{x \in S_n} \|x - y\|_2$. The projection of a vector onto the simplex can be
computed in time $O(n)$ (see procedure SimplexProject described in the appendix). Other than the
gradient computation and projection, the running time of OCGD is $O(n)$ per iteration. □

StrictlyConvexOpt.

Input: Instance in format (6), parameters $G, H$, approximation guarantee $\epsilon$.
Let $t \leftarrow 1$, $x_1 \leftarrow \frac{1}{n} \mathbf{1}$.
While $t \le \frac{G^2}{H\epsilon} \log \frac{1}{\epsilon}$ do

• Let $j \leftarrow$ Separation Oracle $(x_t)$ (i.e. an index of a violated constraint). If all constraints
are satisfied return $x_t$. Else, let $\nabla_{t-1} = \nabla f_j(x_{t-1})$. Let $p_t \leftarrow e_j$, where
$e_j$ is the $j$'th standard basis vector of $\mathbb{R}^m$.

• Set $y_t = x_{t-1} - \frac{1}{Ht} \nabla_{t-1}$

• Set $x_t =$ SimplexProject $(y_t)$.

• $t \leftarrow t + 1$

Return $\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t$

Figure 2: An approximation algorithm for strictly convex programs. Here $\mathbf{1}$ stands for the vector
with one in all coordinates.
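The procedure of Figure 2 can be sketched in runnable form as follows. Several choices here are assumptions rather than the paper's specification: the iteration budget `T` is passed in by the caller, the step size is taken as $1/(Ht)$, the gradient is evaluated at the point just queried, and the simplex projection uses a simple sort-based $O(n \log n)$ routine instead of the linear-time SimplexProject procedure.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def simplex_project(y: Vec) -> Vec:
    """Euclidean projection onto the unit simplex (sort-based, O(n log n))."""
    u = sorted(y, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u):
        css += uk
        t = (css - 1.0) / (k + 1)
        if uk - t > 0:  # holds exactly on a prefix, so the last hit gives the threshold
            theta = t
    return [max(v - theta, 0.0) for v in y]


def strictly_convex_opt(fs: Sequence[Callable[[Vec], float]],
                        grads: Sequence[Callable[[Vec], Vec]],
                        n: int, H: float, eps: float, T: int) -> Tuple[str, Vec]:
    """Gradient steps against a violated constraint, then projection onto the simplex.

    Returns ("feasible", x) with all f_j(x) <= eps, or ("infeasible", p_bar) after T rounds.
    """
    x = [1.0 / n] * n
    counts = [0] * len(fs)
    for t in range(1, T + 1):
        j = next((k for k, f in enumerate(fs) if f(x) > eps), None)  # separation oracle
        if j is None:
            return "feasible", x
        counts[j] += 1  # dual player plays e_j
        g = grads[j](x)
        y = [xi - gi / (H * t) for xi, gi in zip(x, g)]  # assumed step size 1/(H t)
        x = simplex_project(y)
    return "infeasible", [c / T for c in counts]
```

Only one gradient and one projection are computed per round, which is the per-iteration cost the proof of Theorem 3 accounts for.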

Remark: It is clear that the above algorithm can be applied to the more general version of convex
program (3), where the simplex is replaced by an arbitrary convex set $\mathcal{P} \subseteq \mathbb{R}^n$.
The only change required is in the projection step. For Theorem 3 we assumed the underlying convex set
is the simplex, hence the projection can be computed in time $O(n)$. Projections can be computed in
linear time also for the hypercube and ball. For convex sets which are intersections of hyperplanes
(or convex paraboloids), computing a projection reduces to optimizing a convex quadratic function
over linear (quadratic) constraints. These optimization problems allow for more efficient algorithms
than general convex optimization [LVBL98].

As a concrete example of the application of Theorem 3, consider the case of strictly convex
quadratic programming. In this case, there are $m$ constraint functions of the form
$f_j(x) = x^\top A_j x + b_j^\top x + c_j$, where the matrices $A_j$ are positive-definite. If
$A_j \succeq H \cdot I$ and $\max_{x \in S_n} \|A_j x + b_j\|_2 \le G$, then Theorem 3 implies that an
$\epsilon$-approximate solution can be found in $\tilde{O}(\frac{G^2}{H\epsilon})$ iterations.

The implementation of Separation Oracle involves finding a constraint violated by more
than $\epsilon$. In the worst case all constraints need to be evaluated, in time $O(mn^2)$. The gradient of
any constraint can be computed in time $O(n^2)$. Overall, the time per Separation Oracle
computation is $O(mn^2)$. We conclude that the total running time to obtain an $\epsilon$-approximate
solution is $\tilde{O}(\frac{G^2 m n^2}{H\epsilon})$. Notice that the input size is $mn^2$ in this case.
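The cost accounting above can be illustrated with a direct sketch for quadratic constraints. The representation of each constraint as a tuple `(A, b, c)` and the function names are assumptions made for this example. The gradient is computed as $(A + A^\top)x + b$, which for symmetric $A$ equals $2Ax + b$.

```python
from typing import List, Optional, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]
Quad = Tuple[Mat, Vec, float]  # hypothetical encoding of f(x) = x^T A x + b^T x + c


def quad_value(A: Mat, b: Vec, c: float, x: Vec) -> float:
    """Evaluate x^T A x + b^T x + c in O(n^2) time."""
    n = len(x)
    quad = sum(x[i] * A[i][k] * x[k] for i in range(n) for k in range(n))
    return quad + sum(b[i] * x[i] for i in range(n)) + c


def quad_grad(A: Mat, b: Vec, x: Vec) -> Vec:
    """Gradient (A + A^T) x + b in O(n^2) time."""
    n = len(x)
    return [sum((A[i][k] + A[k][i]) * x[k] for k in range(n)) + b[i] for i in range(n)]


def quad_separation_oracle(constraints: Sequence[Quad], x: Vec,
                           eps: float) -> Optional[Tuple[int, Vec]]:
    """Scan all m constraints (O(m n^2) total); return (index, gradient) of a violated one."""
    for j, (A, b, c) in enumerate(constraints):
        if quad_value(A, b, c, x) > eps:
            return j, quad_grad(A, b, x)
    return None
```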

3.2 Linear and Convex Programs 

In this section we prove Theorem 5, which gives an algorithm for convex programming that has
running time proportional to $1/\epsilon$. As a simple consequence we obtain Corollary 6 for linear programs.
The algorithm is derived using the PrimalGameOpt meta-algorithm and the Online Newton
Step (ONS) online convex optimization algorithm (see the appendix) to implement OnlineAlg.
The resulting algorithm is described in Figure 3 below.

Since for general convex programs the constraints are not strictly convex, one cannot apply
online algorithms with logarithmic regret directly as in the previous subsection. Instead, we first
perform a reduction to a mathematical program with exp-concave constraints, and then approximate
the reduced instance.

ConvexOpt.

Input: Instance in format (7), parameters $G, D, \omega$, approximation guarantee $\epsilon$.
Let $t \leftarrow 1$, $x_1 \leftarrow \frac{1}{n} \mathbf{1}$, $\beta \leftarrow \frac{1}{2} \min\{1, \frac{1}{4GD}\}$,
$A_0 \leftarrow \frac{1}{D^2 \beta^2} I_n$, $A_0^{-1} \leftarrow D^2 \beta^2 I_n$.
While $t < T$, where $T$ equals $6nGD\frac{1}{\epsilon}$ times a logarithmic factor, do

• Let $j \leftarrow$ Separation Oracle $(x_t)$ (i.e. an index of a violated constraint). If all constraints
are satisfied return $x_t$. Else, let $\nabla_{t-1} = \nabla\{\log(e + \omega^{-1} f_j(x))\}$ and
$p_t \leftarrow e_j$, where $e_j$ is the $j$'th standard basis vector of $\mathbb{R}^m$.

• Set $y_t = x_{t-1} + \frac{1}{\beta} A_{t-1}^{-1} \nabla_{t-1}$

• Set $x_t = \arg\min_{x \in \mathcal{P}} (y_t - x)^\top A_{t-1} (y_t - x)$

• Set $A_t = A_{t-1} + \nabla_t \nabla_t^\top$, and
$A_t^{-1} = A_{t-1}^{-1} - \frac{A_{t-1}^{-1} \nabla_t \nabla_t^\top A_{t-1}^{-1}}{1 + \nabla_t^\top A_{t-1}^{-1} \nabla_t}$

Return $\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t$

Figure 3: An approximation algorithm for convex programs. Here $I_n$ stands for the $n$-dimensional
identity matrix.
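The rank-one inverse update in Figure 3 is the Sherman-Morrison formula, which is what keeps the bookkeeping at $O(n^2)$ per iteration. The sketch below shows that update and the unprojected Newton-style step in plain Python. The function names are illustrative, and the generalized projection onto $\mathcal{P}$ is deliberately left out because it requires a quadratic programming routine.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def sherman_morrison_update(A_inv: Mat, g: Vec) -> Mat:
    """Given A_inv = A^{-1}, return (A + g g^T)^{-1} in O(n^2) time."""
    n = len(g)
    Ag = [sum(A_inv[i][k] * g[k] for k in range(n)) for i in range(n)]  # A^{-1} g
    gA = [sum(g[k] * A_inv[k][j] for k in range(n)) for j in range(n)]  # g^T A^{-1}
    denom = 1.0 + sum(g[i] * Ag[i] for i in range(n))
    return [[A_inv[i][j] - Ag[i] * gA[j] / denom for j in range(n)] for i in range(n)]


def newton_step(x: Vec, A_inv: Mat, g: Vec, beta: float) -> Vec:
    """Unprojected point y = x + (1/beta) * A^{-1} g (ascent direction for concave payoffs)."""
    n = len(x)
    return [x[i] + sum(A_inv[i][k] * g[k] for k in range(n)) / beta for i in range(n)]
```

Maintaining the inverse incrementally avoids an $O(n^3)$ matrix inversion in every round.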

Proof of Theorem 5. In this proof it is easier for us to consider concave constraints rather than
convex. Mathematical program (3) can be converted to the following by negating each constraint:

$$f_j(x) \ge 0 \quad \forall j \in [m], \qquad x \in \mathcal{P} \tag{7}$$

where the functions $\{f_j\}$ are all concave such that
$\forall x \in \mathcal{P}, j \in [m] .\ \|\nabla f_j(x)\|_2 \le G$ and
$\forall x \in \mathcal{P}, j \in [m] .\ |f_j(x)| \le \omega$. This program is even more general than (1)
as it allows for an arbitrary convex set $\mathcal{P}$ rather than $S_n$.

Let $\rho = \max_{x \in \mathcal{P}} \min_{j \in [m]} \{f_j(x)\}$. The question of whether this convex program
is feasible is equivalent to whether $\rho \ge 0$.

In order to approximately solve this convex program, we consider a different concave
mathematical program,

$$\log(e + \omega^{-1} f_j(x)) \ge 1 \quad \forall j \in [m], \qquad x \in \mathcal{P} \tag{8}$$

It is a standard fact that concavity is preserved for the composition of a non-decreasing concave
function with another concave function, i.e. the logarithm of positive concave functions is itself
concave. To solve this program we consider the (non-linear) zero sum game defined by the following
min-max formulation

$$\lambda^* = \max_{x \in \mathcal{P}} \min_{j \in [m]} \log(e + \omega^{-1} f_j(x)) \tag{9}$$

The following two claims show that program (8) is closely related to (7).

Claim 8. $\lambda^* = \log(e + \omega^{-1} \rho)$.

Proof. Let $\hat{x}$ be a solution to (7) which achieves the value $\rho$, that is
$\forall j \in [m] .\ f_j(\hat{x}) \ge \rho$. This implies that
$\forall j \in [m] .\ \log(e + \omega^{-1} f_j(\hat{x})) \ge \log(e + \omega^{-1} \rho)$, and in particular
$\forall q .\ g(\hat{x}, q) \ge \log(e + \omega^{-1} \rho)$, hence $\lambda^* \ge \log(e + \omega^{-1} \rho)$.

For the other direction, suppose that $\lambda^* = \log(e + z) > \log(e + \omega^{-1} \rho)$ for some
$z > \omega^{-1} \rho$. Then there exists an $x$ such that
$\forall j \in [m] .\ \log(e + \omega^{-1} f_j(x)) \ge \lambda^* = \log(e + z)$, or equivalently
$\forall j \in [m] .\ f_j(x) \ge \omega z > \rho$, in contradiction to the definition of $\rho$. □

Claim 9. An $\epsilon$-approximate solution for (8) is a $3\omega\epsilon$-approximate solution for (7).

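Proof. An $\epsilon$-approximate solution to (8) satisfies
$\forall j .\ \log(e + \omega^{-1} f_j(x)) \ge \lambda^* - \epsilon = \log(e + \omega^{-1} \rho) - \epsilon$.
Therefore, by monotonicity of the logarithm we have

$$\begin{aligned}
\omega^{-1} f_j(x) &\ge e^{\log(e + \omega^{-1} \rho) - \epsilon} - e \\
&= (e + \omega^{-1} \rho) \cdot e^{-\epsilon} - e \\
&\ge (e + \omega^{-1} \rho)(1 - \epsilon) - e \\
&= \omega^{-1} \rho (1 - \epsilon) - e \epsilon
\end{aligned}$$

which implies (using $e < 3$)

$$f_j(x) \ge \rho (1 - \epsilon) - 3 \omega \epsilon$$

□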
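The chain of inequalities in the proof of Claim 9 can be sanity-checked numerically. The sketch below, with hypothetical function names, takes the smallest value of $f_j(x)$ that the log-domain guarantee allows and compares it against the claimed bound $\rho(1-\epsilon) - 3\omega\epsilon$; it assumes $|\rho| \le \omega$, as in the setting of program (7).

```python
import math


def worst_case_f(rho: float, omega: float, eps: float) -> float:
    """Smallest f with log(e + f/omega) >= log(e + rho/omega) - eps."""
    return omega * ((math.e + rho / omega) * math.exp(-eps) - math.e)


def claim9_holds(rho: float, omega: float, eps: float, tol: float = 1e-12) -> bool:
    """Check f >= rho * (1 - eps) - 3 * omega * eps at the worst-case f."""
    return worst_case_f(rho, omega, eps) >= rho * (1.0 - eps) - 3.0 * omega * eps - tol
```

The slack in the inequality is at least $(3 - e)\omega\epsilon$, which is why the constant 3 suffices.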



We proceed to approximate $\lambda^*$ using PrimalGameOpt and choose the Online Newton Step
(ONS) algorithm (see the appendix) as OnlineAlg. The resulting algorithm is depicted in Figure 3.

We note that here the primal player is maximizing payoff as opposed to the minimization version
in the proof of Theorem 7. The maximization version of Theorem 7 can be proved analogously.

In order to analyze the number of iterations required, we calculate some parameters of the
constraints of formulation (8). See the appendix for an explanation of how the different parameters
affect the regret and running time of Online Newton Step.

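The constraint functions are 1-exp-concave, since their exponents are linear functions. Their gradients are bounded by

\[ \tilde{G} = \max_{j \in [m]} \max_{x \in \mathcal{P}} \left\| \nabla \log(e + \omega^{-1} f_j(x)) \right\| = \max_{j \in [m]} \max_{x \in \mathcal{P}} \left\| \frac{\omega^{-1} \nabla f_j(x)}{e + \omega^{-1} f_j(x)} \right\| \le \omega^{-1} G \]

According to Theorem 2 in [HKKA06], the regret of ONS is $O((\frac{1}{\alpha} + GD)\, n \log T)$. In our setting, $\alpha = 1$ and $G$ is replaced by $\tilde{G}$. Therefore, the regret becomes smaller than $\varepsilon T$ after $\tilde{O}(\frac{n G D \omega^{-1}}{\varepsilon})$ iterations. By Theorem 3, after $T = \tilde{O}(\frac{n G D \omega^{-1}}{\delta})$ iterations we obtain a $\delta$-approximate solution, i.e. a solution $x^*$ such that

\[ \min_{j \in [m]} \log(e + \omega^{-1} f_j(x^*)) \ge \lambda^* - \delta \]

which by Claim 9 is a $3\omega\delta$-approximate solution to the original mathematical program. Taking $\delta = O(\omega^{-1} \varepsilon)$ we obtain an $\varepsilon$-approximate solution to concave program (7) in $T = \tilde{O}(\frac{n G D}{\varepsilon})$ iterations.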

We now analyze the running time per iteration. Each iteration requires a call to Separation Oracle in order to find an $\varepsilon$-violated constraint. The gradient of that constraint then needs to be computed. Given the gradient, the ONS algorithm takes $O(n^2)$ time to update its internal data structures (which are $y_t, A_t, A_t^{-1}$ in the figure). Finally ONS computes a generalized projection onto $\mathcal{P}$, which corresponds to computing $\arg\min_{x \in \mathcal{P}} (y - x)^\top A_{t-1} (y - x)$ given $y$ (see appendix D).

If $\mathcal{P} = \mathbb{S}_n$, then $D = 1$ and the bounds of the theorem are met.

□ 
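The $O(n^2)$ bookkeeping per iteration can be illustrated with a rank-one inverse update. The sketch below assumes, following the description of ONS in [HKKA06] rather than anything restated in this section, that the matrix is updated as $A_t = A_{t-1} + \nabla_t \nabla_t^\top$, so that $A_t^{-1}$ follows from $A_{t-1}^{-1}$ by the Sherman-Morrison formula. The function name is hypothetical.

```python
def sherman_morrison_update(A_inv, g):
    """Return (A + g g^T)^{-1} given the symmetric matrix A^{-1}.

    Uses O(n^2) arithmetic operations instead of the O(n^3) cost of
    inverting from scratch. A_inv is a list of rows, g a list of floats.
    """
    n = len(g)
    # u = A^{-1} g; by symmetry of A^{-1}, g^T A^{-1} equals u^T.
    u = [sum(A_inv[i][k] * g[k] for k in range(n)) for i in range(n)]
    # denom = 1 + g^T A^{-1} g
    denom = 1.0 + sum(g[i] * u[i] for i in range(n))
    return [[A_inv[i][j] - u[i] * u[j] / denom for j in range(n)]
            for i in range(n)]

# Example: A = I and g = (1, 2), so A + g g^T = [[2, 2], [2, 5]].
updated = sherman_morrison_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0])
```

The generalized projection step is not covered by this sketch; it is a separate convex quadratic problem over $\mathcal{P}$.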
Given the theorem, it is straightforward to derive the corollary for linear programs:

Proof of the corollary. For linear programs in format (2), the gradients of the constraints are bounded by $\max_{j \in [m]} \|A_j\| \le 1$. In addition, Separation Oracle is easy to implement in time $O(mn)$ by evaluating all constraints.

Denote by $T_{A,\mathrm{proj}}$ the time to compute a generalized projection onto the simplex. A worst case bound is $T_{A,\mathrm{proj}} = O(n^3)$, using interior point methods (this is an instance of a quadratically constrained convex quadratic program, see [LVBL98]).

Plugging these parameters into the theorem, the total running time comes to

\[ \tilde{O}\left( \frac{n}{\varepsilon} \left( nm + n^2 + T_{A,\mathrm{proj}} \right) \right) \]

□ 

Remark: As is the case for strictly convex programming, our framework actually provides a more general algorithm that requires a Separation Oracle. Given such an oracle, the corresponding optimization problem can be solved in time $\tilde{O}(\frac{n}{\varepsilon} \cdot (n^2 + T_{A,\mathrm{proj}} + T_{\mathrm{oracle}}))$, where $T_{\mathrm{oracle}}$ is the running time of Separation Oracle.

3.3 Derivation of previous results 

For completeness, we prove Theorem ^ using our framework. Even more generally, we prove the 
theorem for general convex program (J3J) rather than (^Q). 

Proof of the theorem. Consider the associated game with value

\[ \lambda^* = \min_{x \in \mathcal{P}} \max_{j \in [m]} f_j(x) = \max_{p \in \mathbb{S}_m} \min_{x \in \mathcal{P}} \sum_{i=1}^m p_i f_i(x) \]

The convex problem is feasible iff $\lambda^* \le 0$. To approximate $\lambda^*$, we apply the DualGameOpt meta algorithm. The vectors $x_t$ are points in the convex set $\mathcal{P}$, and $p_t$ are distributions over the constraints, i.e. points in the $m$-dimensional simplex. The payoff functions for OnlineAlg in iteration $t$ are of the form

\[ p \mapsto g(x_t, p) = \sum_i p_i f_i(x_t) \]

The online algorithm used to implement OnlineAlg is the Multiplicative Weights algorithm (MW). According to Theorem 13 in appendix C the regret of MW is bounded by $\mathrm{Regret}_T(MW) = O(G_\infty \sqrt{T \log m})$ (the dimension of the online player is $m$ in this case). Hence, the number of iterations until the regret drops to $\varepsilon T$ is $O(\frac{G_\infty^2 \log m}{\varepsilon^2})$. According to Theorem 7 this is the number of iterations required to obtain an $\varepsilon$-approximation.

To bound $G_\infty$, note that the payoff functions $p \mapsto g(x_t, p)$ are linear. Their gradients are $m$-dimensional vectors such that the $i$'th coordinate is the value of the $i$'th constraint on the point $x_t$, i.e. $f_i(x_t)$. Thus, the norm of the gradients can be bounded by

\[ G_\infty = \max_{t \in [T]} \left\| \nabla_p\, g(x_t, p) \right\|_\infty \le \max_{i \in [m]} \max_{x \in \mathcal{P}} |f_i(x)| \]

and the latter expression is bounded by the width $\omega = \max_{j \in [m]} \max_{x \in \mathcal{P}} |f_j(x)|$. Thus the number of iterations to obtain an $\varepsilon$-approximate solution is bounded by $\tilde{O}(\frac{\omega^2}{\varepsilon^2})$.

In each iteration, the MW algorithm needs to update the current online strategy (the vector $p_t$) according to the gradient in time $O(m)$. This requires a single gradient computation. □
A Lower bounds

The algorithmic scheme described here generalizes previous approaches, which are generally known as Dantzig-Wolfe-type algorithms. These algorithms are characterized by the way the constraints of the mathematical program are accessed: in every iteration only a single Optimization Oracle call is allowed.

For the special case in which the constraints are linear, there is a long line of work leading to tight lower bounds on the number of iterations required for algorithms within the Dantzig-Wolfe framework to provide an $\varepsilon$-approximate solution. Already in 1977, Khachiyan proved an $\Omega(\frac{1}{\varepsilon})$ lower bound on the number of iterations to achieve an error of $\varepsilon$. This was tightened to $\Omega(\frac{1}{\varepsilon^2})$ by Klein and Young [KY99], and independently by Freund and Schapire [FS99]. Some parameters were tightened in [AHK05a].

For the game theoretic framework we consider, it is particularly simple and intuitive to derive tight lower bounds. These lower bounds do not hold for the more general Dantzig-Wolfe framework. However, virtually all known Lagrangian-relaxation-type algorithms can be derived from our framework. Thus, for all these algorithms lower bounds on the running time in terms of $\varepsilon$ can be derived from the following observation.

In our setting, the number of iterations depends on the regret achievable by the online game playing algorithm which is deployed. Tight lower bounds are known on the regret achievable by online algorithms.

Lemma 10 (folklore). For linear payoff functions any online convex optimization algorithm incurs $\Omega(G_\infty \sqrt{T})$ regret.

Proof. This can be seen by a simple randomized example. Consider $\mathcal{P} = [-1, 1]$ and linear functions $f_t(x) = r_t x$, where $r_t = \pm 1$ are chosen in advance, independently with equal probability. Then $\mathbb{E}_{r_t}[f_t(x_t)] = 0$ for any $t$ and $x_t$ chosen online, by independence of $x_t$ and $r_t$. However, $\mathbb{E}_{r_1, \ldots, r_T}[\min_{x \in \mathcal{P}} \sum_t f_t(x)] = \mathbb{E}[-|\sum_t r_t|] = -\Omega(\sqrt{T})$. Multiplying $r_t$ by any constant (which corresponds to $G_\infty$) yields the result. □
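The quantity $\mathbb{E}|\sum_t r_t|$ in the proof can be computed exactly for small $T$, which makes the $\sqrt{T}$ growth visible. The following is an illustrative sketch only; the function name is an assumption, and the computation simply sums over the binomial distribution of the number of $+1$ signs.

```python
import math

def expected_abs_sum(T):
    """Exact E|r_1 + ... + r_T| for independent uniform signs r_t in {-1, +1}.

    If k of the signs are +1, the sum equals 2k - T, and k is Binomial(T, 1/2).
    """
    total = sum(math.comb(T, k) * abs(2 * k - T) for k in range(T + 1))
    return total / 2 ** T

# The ratio to sqrt(T) stays bounded away from zero as T grows,
# so the expected regret in the example of Lemma 10 is of order sqrt(T).
ratios = [expected_abs_sum(T) / math.sqrt(T) for T in (4, 16, 64, 256)]
```

Since every online strategy has expected payoff zero in this example, `expected_abs_sum(T)` is exactly the expected regret against the best fixed point in hindsight.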

The above simple lemma is essentially the reason why it took more than a decade to break the $\frac{1}{\varepsilon^2}$ running time. The reason why we obtain algorithms with linear dependence on $\frac{1}{\varepsilon}$ is the use of strictly convex constraints (or, in case the original constraints are linear, a reduction to strictly convex constraints).
B A general min-max theorem

In this section we prove a generalized version of the von Neumann min-max theorem. The proof is algorithmic in nature, and differs from previous approaches which were based on fixed point theorems.

Freund and Schapire [FS99] provide an algorithmic proof of the (standard) min-max theorem, and this proof is an extension of their ideas to the more general case. The additional generality is in two parameters: first, we allow more general underlying convex sets, whereas the standard min-max theorem deals with the $n$-dimensional simplex $\mathbb{S}_n$. Second, we allow convex-concave functions as defined below rather than linear functions. Both generalizations stem from the fact that we use general online convex optimization algorithms as the strategy for the two players, rather than the specific "expert-type" algorithms which Freund and Schapire use. Other than this difference, the proof itself follows [FS99] almost exactly.

The original minimax theorem can be stated as follows. 

Theorem 11 (von Neumann). If $X, Y$ are finite dimensional simplices and $f$ is a bilinear function on $X \times Y$, then $f$ has a saddle point, i.e.

\[ \min_{x \in X} \max_{y \in Y} f(x, y) = \max_{y \in Y} \min_{x \in X} f(x, y) \]

Here we consider a more general setting, in which the two sets $X, Y$ can be arbitrary closed, non-empty, bounded and convex sets in Euclidean space and the function $f$ is convex-concave as defined by:

Definition 2. A function $f$ on $X \times Y$ is convex-concave if for every $y \in Y$ the function $x \mapsto f_y(x) = f(x, y)$ is convex on $X$, and for every $x \in X$ the function $y \mapsto f_x(y) = f(x, y)$ is concave on $Y$.

Theorem 12. If $X, Y$ are closed non-empty bounded convex sets and $f$ is a convex-concave function on $X \times Y$, then $f$ has a saddle point, i.e.

\[ \max_{y \in Y} \min_{x \in X} f(x, y) = \min_{x \in X} \max_{y \in Y} f(x, y) \]

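Proof. Let $\mu^* = \max_{y \in Y} \min_{x \in X} f(x, y)$ and $\lambda^* = \min_{x \in X} \max_{y \in Y} f(x, y)$. Obviously $\mu^* \le \lambda^*$ (this is called weak duality).

Apply the algorithm PrimalDualGameOpt with any low-regret online convex optimization algorithm.⁴ Then by the regret guarantees we have for the first algorithm (let $\bar{y} = \frac{1}{T} \sum_{t=1}^T y_t$)

\begin{align*}
\frac{1}{T} \sum_{t=1}^T f(x_t, y_t) &\le \min_{x \in X} \frac{1}{T} \sum_{t=1}^T f(x, y_t) + \frac{R_1}{T} \\
&\le \min_{x \in X} f(x, \bar{y}) + \frac{R_1}{T} && \text{concavity of } f_x \\
&\le \max_{y \in Y} \min_{x \in X} f(x, y) + \frac{R_1}{T} \\
&= \mu^* + \frac{R_1}{T}
\end{align*}

Similarly for the second online algorithm we have (let $\bar{x} = \frac{1}{T} \sum_{t=1}^T x_t$)

\begin{align*}
\frac{1}{T} \sum_{t=1}^T f(x_t, y_t) &\ge \max_{y \in Y} \frac{1}{T} \sum_{t=1}^T f(x_t, y) - \frac{R_2}{T} \\
&\ge \max_{y \in Y} f(\bar{x}, y) - \frac{R_2}{T} && \text{convexity of } f_y \\
&\ge \min_{x \in X} \max_{y \in Y} f(x, y) - \frac{R_2}{T} \\
&= \lambda^* - \frac{R_2}{T}
\end{align*}

Combining both observations we obtain

\[ \lambda^* - \frac{R_2}{T} \le \mu^* + \frac{R_1}{T} \]

Since the regrets $R_1, R_2$ are sublinear in $T$, letting $T \to \infty$ we obtain $\mu^* \ge \lambda^*$.

□

⁴ For a low-regret algorithm to exist, we need $f$ to be convex-concave and the underlying sets $X, Y$ to be convex, nonempty, closed and bounded.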

C Online convex optimization algorithms 

Figure 4 summarizes several known low regret algorithms. The running time is the time it takes to produce the point $x_t \in \mathcal{P}$ given all prior game history.



Algorithm                             Regret bound                              Running time
Online convex gradient descent        $\frac{G^2}{H} \log(T)$                   $O(n + T_{\mathrm{proj}})$
Online Newton step                    $(\frac{1}{\alpha} + GD)\, n \log T$      $O(n^2 + T_{A,\mathrm{proj}})$
Exponentially weighted online opt.    $\frac{n}{\alpha} \log(T)$                $\mathrm{poly}(n)$
Multiplicative Weights                $G_\infty \sqrt{T \log n}$                $O(n)$

Figure 4: Various online convex optimization algorithms and their performance. $T_{\mathrm{proj}}$ is the time to project a vector $y \in \mathbb{R}^n$ to $\mathcal{P}$, i.e. to compute $\arg\min_{x \in \mathcal{P}} \|y - x\|_2$. $T_{A,\mathrm{proj}}$ is the time to project a vector $y \in \mathbb{R}^n$ to $\mathcal{P}$ using the norm defined by the PSD matrix $A$, i.e. to compute $\arg\min_{x \in \mathcal{P}} (y - x)^\top A (y - x)$.

The first three algorithms are from [HKKA06] and are applicable to the general online convex optimization framework. The description and analysis of these algorithms is beyond our scope, and the reader is referred to the paper.

The last algorithm is based on the ubiquitous Multiplicative Weights Update method (for more applications of the method see the survey [AHK05a]), and is provided below. Although it was used many times for various applications (for a very detailed analysis in similar settings see [KW97]), this application to general online convex optimization over the simplex seems to be new (Freund and Schapire [FS99] analyze this algorithm exactly, although for linear payoff functions rather than for general convex functions).

This online algorithm, which is called "exponentiated gradient" in the machine learning literature, attains similar performance guarantees to the "online gradient descent" algorithm of Zinkevich [Zin03]. Despite being less general than Zinkevich's algorithm (we only give an application to the $n$-dimensional simplex, whereas online gradient descent can be applied over any convex set in Euclidean space), it attains somewhat better performance as given in the following theorem.






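Multiplicative Weights.

Inputs: parameter $\eta \le \frac{1}{2}$.

• On period 1, play the uniform distribution $x_1 = \frac{1}{n} \vec{1} \in \mathbb{S}_n$. Let $\forall i \in [n] .\ w_1(i) = 1$.

• On period $t$, update

\[ w_t(i) = w_{t-1}(i) \cdot \left( 1 - \frac{\eta}{G_\infty} \nabla_{t-1}(i) \right) \]

where $\nabla_t = \nabla f_t(x_t)$, and play $x_t$ defined as

\[ x_t = \frac{w_t}{\|w_t\|_1} \]

Figure 5: The Multiplicative Weights algorithm for online convex optimization over the simplex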
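The procedure of Figure 5 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the representation of the payoff functions as a list of gradient callables, and the example values are assumptions. The sign convention follows the update $1 - \eta \nabla_t(i) / G_\infty$ used in the proof of Theorem 13.

```python
def mw_play(grad_fns, n, eta, g_inf):
    """Run Multiplicative Weights over the n-dimensional simplex.

    grad_fns : list of callables; grad_fns[t](x) returns the gradient of
               f_t at x as a list of n floats (hypothetical interface)
    eta      : step parameter, assumed to be at most 1/2
    g_inf    : an upper bound on the largest gradient coordinate magnitude
    Returns the list of points x_1, ..., x_T that were played.
    """
    w = [1.0] * n                      # w_1(i) = 1 for all i
    plays = []
    for grad in grad_fns:
        total = sum(w)
        x = [wi / total for wi in w]   # x_t = w_t / ||w_t||_1
        plays.append(x)
        g = grad(x)                    # gradient of f_t at x_t
        w = [wi * (1.0 - eta * gi / g_inf) for wi, gi in zip(w, g)]
    return plays

# Example with the fixed linear loss f_t(x) = x(1): weight shifts away
# from the first coordinate.
plays = mw_play([lambda x: [1.0, 0.0]] * 3, n=2, eta=0.5, g_inf=1.0)
```

Each round costs $O(n)$ arithmetic operations beyond the gradient computation, matching the running time listed in Figure 4.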



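Theorem 13. The Multiplicative Weights algorithm achieves the following guarantee, for all $T \ge 1$.

\[ \mathrm{Regret}(MW, T) = \sum_{t=1}^T f_t(x_t) - \min_{x \in \mathbb{S}_n} \sum_{t=1}^T f_t(x) \le O(G_\infty \sqrt{T \log n}) \]

Proof. Define $\Phi_t = \sum_i w_t(i)$. Since $\frac{1}{G_\infty} \nabla_t(i) \in [-1, 1]$,

\begin{align*}
\Phi_{t+1} &= \sum_i w_t(i) \left( 1 - \eta \nabla_t(i) / G_\infty \right) \\
&= \Phi_t - \frac{\eta \Phi_t}{G_\infty} \sum_i x_t(i) \nabla_t(i) && \text{since } x_t(i) = w_t(i) / \Phi_t \\
&= \Phi_t \left( 1 - \eta\, x_t^\top \nabla_t / G_\infty \right) \\
&\le \Phi_t\, e^{-\eta\, x_t^\top \nabla_t / G_\infty} && \text{since } 1 - x \le e^{-x} \text{ for } |x| \le 1
\end{align*}

After $T$ rounds, using $\Phi_1 = n$, we have

\[ \Phi_{T+1} \le \Phi_1\, e^{-\eta \sum_t x_t^\top \nabla_t / G_\infty} = n\, e^{-\eta \sum_t x_t^\top \nabla_t / G_\infty} \tag{10} \]

Also, for every $i \in [n]$, using the following facts which follow immediately from the convexity of the exponential function

\begin{align*}
(1 - \eta)^x &\le (1 - \eta x) && \text{if } x \in [0, 1] \\
(1 + \eta)^{-x} &\le (1 - \eta x) && \text{if } x \in [-1, 0]
\end{align*}

we have

\begin{align*}
\Phi_{T+1} \ge w_{T+1}(i) &= \prod_t \left( 1 - \eta \nabla_t(i) / G_\infty \right) \\
&\ge (1 - \eta)^{\sum_{\ge 0} \nabla_t(i) / G_\infty} \cdot (1 + \eta)^{-\sum_{< 0} \nabla_t(i) / G_\infty}
\end{align*}

where the subscripts $\ge 0$ and $< 0$ refer to the rounds $t$ where $\nabla_t(i)$ is $\ge 0$ and $< 0$ respectively. So together with (10)

\[ n\, e^{-\eta \sum_t x_t^\top \nabla_t / G_\infty} \ge (1 - \eta)^{\sum_{\ge 0} \nabla_t(i) / G_\infty} \cdot (1 + \eta)^{-\sum_{< 0} \nabla_t(i) / G_\infty} \]

Taking logarithms and using $\ln(\frac{1}{1 - \eta}) \le \eta + \eta^2$ and $\ln(1 + \eta) \ge \eta - \eta^2$ for $\eta \le \frac{1}{2}$, we get for all $i \in [n]$ and $x^* \in \mathbb{S}_n$

\[ \sum_t x_t^\top \nabla_t \le (1 + \eta) \sum_{\ge 0} \nabla_t(i) + (1 - \eta) \sum_{< 0} \nabla_t(i) + \frac{G_\infty \log n}{\eta} \le \sum_t x^{*\top} \nabla_t + \eta \sum_t x^{*\top} |\nabla_t| + \frac{G_\infty \log n}{\eta} \]

where we denote by $|\nabla_t|$ the vector that has in coordinate $i$ the value $|\nabla_t(i)|$; the second inequality holds because the first bound is valid for every $i$ and $x^*$ is a convex combination of the coordinates. Therefore, by convexity of the $f_t$,

\begin{align*}
\sum_t f_t(x_t) - f_t(x^*) &\le \sum_t \nabla_t^\top (x_t - x^*) \\
&\le \eta \sum_t x^{*\top} |\nabla_t| + \frac{G_\infty \log n}{\eta} \\
&\le \eta\, T\, G_\infty + \frac{G_\infty \log n}{\eta}
\end{align*}

and the proof follows by choosing $\eta = \sqrt{\frac{\log n}{T}}$.

□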

Remark: As the algorithm is phrased, it needs to know $T$ and $G_\infty$ in advance (this is not a problem for the way we use online algorithms as a building block in approximate optimization). Standard techniques can be used so that the algorithm need not accept any input: the dependence on $T$ can be removed by doubling the value of $T$ whenever it is exceeded. The dependence on $G_\infty$ can be removed by using, at any point in the run of the algorithm, the largest $G_\infty$ value encountered thus far.



D Projections onto convex sets 

Many of the algorithms for online convex optimization described in this chapter require computing projections onto the underlying convex set. This corresponds to the following computational problem: given a convex set $\mathcal{P} \subseteq \mathbb{R}^n$ and a point $y \in \mathbb{R}^n$, find the point in the convex set which is closest in Euclidean distance to the given vector. We denote the latter by $\Pi_{\mathcal{P}}[y]$.

This problem can be formulated as a convex program, and thus solved in polynomial time by interior point methods or the ellipsoid method. However, for many simple convex bodies which arise in practical applications (some of which will be detailed in following chapters), projections can be computed much more efficiently. For the $n$-dimensional unit sphere, cube and the simplex these projections can be computed combinatorially in $\tilde{O}(n)$ time, rendering the online algorithms much more efficient when applied to these convex bodies.



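The unit sphere The simplest projection is over the unit $n$-dimensional sphere, which we denote by $\mathbb{B}_n = \{x \in \mathbb{R}^n, \|x\|_2 \le 1\}$. Given a vector $y \in \mathbb{R}^n$, it is easy to verify that its projection is

\[ \Pi_{\mathbb{B}_n}[y] = \begin{cases} y & \|y\|_2 \le 1 \\ \frac{y}{\|y\|_2} & \|y\|_2 > 1 \end{cases} \]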
The unit cube Another body which is easy to project onto is the unit $n$-dimensional cube, which we denote by $\mathbb{C}_n = \{x \in \mathbb{R}^n, \|x\|_\infty \le 1\}$ (i.e. each coordinate is at most one in absolute value). Given a vector $y \in \mathbb{R}^n$, it is easy to verify that its projection is

\[ \forall i \in [n] .\ \Pi_{\mathbb{C}_n}[y](i) = \begin{cases} y(i) & y(i) \in [-1, 1] \\ 1 & y(i) > 1 \\ -1 & y(i) < -1 \end{cases} \]
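Both closed-form projections take a single pass over the coordinates. The following is a minimal sketch; the function names are illustrative assumptions.

```python
import math

def project_ball(y):
    """Euclidean projection of y onto the unit ball {x : ||x||_2 <= 1}."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= 1.0:
        return list(y)                 # already inside: unchanged
    return [v / norm for v in y]       # otherwise rescale to unit norm

def project_cube(y):
    """Euclidean projection of y onto the unit cube {x : ||x||_inf <= 1}."""
    # Each coordinate is clipped to [-1, 1] independently.
    return [min(1.0, max(-1.0, v)) for v in y]

ball_point = project_ball([3.0, 4.0])        # rescaled to norm one
cube_point = project_cube([2.0, -3.0, 0.5])  # clipped coordinate-wise
```

Both functions run in $O(n)$ time.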



The Simplex The first non-trivial projection we encounter is over the $n$-dimensional simplex. The simplex is the set of all $n$-dimensional distributions, and hence is particularly interesting in many real-world problems, portfolio management and haplotype frequency estimation just to name a few. Surprisingly, given an arbitrary vector in Euclidean space, the closest distribution can be found in near linear time. A procedure for computing such a projection is given in figure 6.

SimplexProject(y).

Suppose w.l.o.g. that $y_1 \ge y_2 \ge \ldots \ge y_n$ (otherwise sort the indices of $y$).

• Let $a \in \mathbb{R}$ be the number such that $\sum_{i=1}^n \max\{y_i - a, 0\} = 1$. Set $\forall i \in [n] .\ x_i = \max\{y_i - a, 0\}$.

• Return $x$

Figure 6: A Procedure for projecting onto the Simplex



Lemma 14. SimplexProject(y) is the projection of $y \in \mathbb{R}^n$ to the $n$-dimensional simplex, and can be computed in time $\tilde{O}(n)$.

Proof. First, note that the number $a$ computed in SimplexProject exists and is unique. This follows since the function $f(a) = \sum_{i=1}^n \max\{y_i - a, 0\}$ is continuous, monotone decreasing (strictly so wherever it is positive), and takes all values in $[0, \infty)$.

Next, the vector returned $x = $ SimplexProject(y) is in the simplex. All its coordinates are non-negative by definition, and $\sum_{i=1}^n x_i = \sum_{i=1}^n \max\{y_i - a, 0\} = 1$.

To show that $x$ is indeed the projection we need to prove that it is the optimum of the mathematical program

\[ \min_{x \in \mathbb{S}_n} \sum_{i=1}^n (y_i - x_i)^2 \]

It suffices to show that $x$ is a local optimum, since the program is convex. Let $c_i = y_i - x_i$. Then the values $\{c_i\}$ are decreasing and of the form

\[ (c_1, \ldots, c_n) = (a, \ldots, a, y_k, \ldots, y_n) \]

An allowed local change is of the form $x'_i \leftarrow x_i - \epsilon$ and $x'_j \leftarrow x_j + \epsilon$ with $\epsilon > 0$ and $x_i > 0$, since all coordinates with index $k$ or larger have $x_k = 0$ and cannot be decreased. This would cause a change in the objective of the form

\[ \sum_l (y_l - x_l)^2 - (y_l - x'_l)^2 = a^2 - (a + \epsilon)^2 + c_j^2 - (c_j - \epsilon)^2 = -2(a - c_j)\epsilon - 2\epsilon^2 < 0 \]

where the last inequality uses $c_j \le a$. Hence any such change would only increase the objective. Therefore $x$ is indeed the projection of $y$.

The procedure SimplexProject requires sorting $n$ elements and finding the value $a$, which is standard to implement in $O(n \log n) = \tilde{O}(n)$ time.

□ 
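The procedure of Figure 6 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the threshold $a$ is located by scanning the sorted values and keeping the last prefix whose candidate threshold still leaves its smallest element positive, which is one standard way to solve $\sum_i \max\{y_i - a, 0\} = 1$. The function name is hypothetical.

```python
def simplex_project(y):
    """Euclidean projection of y onto the simplex {x >= 0, sum(x) = 1}.

    Sorts a copy of y in decreasing order (O(n log n)), then finds the
    threshold a such that sum(max(y_i - a, 0)) = 1 in one linear scan.
    """
    u = sorted(y, reverse=True)
    cumulative = 0.0
    a = 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        # Threshold that would make the k largest entries sum to one.
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            a = candidate              # the k-th entry stays positive
    return [max(v - a, 0.0) for v in y]

projected = simplex_project([0.2, 0.1, -1.0])
```

The result keeps the original coordinate order of `y`, so the caller does not need to undo the sort.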

E Examples of strictly convex mathematical programs 

In this section we give some examples of problems which arise in practice and contain strictly convex constraints. The first example below, and many others, appear in the excellent survey [LVBL98].

E.1 Portfolio optimization with loss risk constraints

A classical portfolio problem described in [LVBL98] is to maximize the return of a portfolio over $n$ assets under constraints which limit its risk. The underlying model assumes a Gaussian distribution of the asset prices with known $n$-dimensional mean and covariance matrix.

The constraints bound the probability of the portfolio to achieve a certain return under the model. A feasibility version, of just checking whether a portfolio exists that attains certain risk with different mean-covariance parameters, can be written as the following mathematical program

\[ p_j^\top x - \beta \cdot x^\top \Sigma_j x \ge \alpha \quad \forall j \in [m], \qquad x \in \mathbb{S}_n \tag{11} \]

We refer the reader to [LVBL98], section 3.4, for more details.

If the underlying Gaussian distributions are not degenerate, the covariance matrices $\Sigma_j$ are positive definite. If the covariance matrices are degenerate, there is a linear dependence between two or more assets. In this case it is sufficient to consider a smaller portfolio with only one of the assets.

The non-degeneracy translates to a strictly positive constant $H > 0$ such that $\forall j \in [m] .\ \Sigma_j \succeq H \cdot I$. This is, of course, the smallest eigenvalue of the covariance matrices.

E.2 Computing the best CRP in hindsight with transaction costs normalization 

In a popular model for portfolio management (see [Cov91, HSSW96]) the market is represented by a set of price relative vectors $r_1, \ldots, r_T \in \mathbb{R}^n_+$. These vectors represent the daily change in price for a set of $n$ assets. A Constant Rebalanced Portfolio (CRP) is an investment strategy that redistributes the wealth daily according to a fixed distribution $p \in \mathbb{S}_n$. A natural investment strategy computes the best CRP up to a certain trading day and invests according to this distribution in the upcoming day.

On this basic mathematical program many variants have been proposed. In [AH05], a logarithmic barrier function is added to the objective, which makes it possible to prove theoretical bounds on the performance. Bertsimas |Berflfij suggested adding a quadratic term to the objective function so as to take into account transaction costs. An example of a convex program to find the best CRP, subject to transaction cost constraints, is

\[ \max \; \sum_{t=1}^T \log(p^\top r_t) + \sum_{i=1}^n \log(p^\top e_i) \tag{12} \]
\[ \|p - \hat{p}\|_2^2 \le c, \qquad p \in \mathbb{S}_n \]

The objective function includes the logarithmic barrier of [AH05]; the vectors $\{e_i\}$ are the standard basis unit vectors. The constraint enforces small distance to the current distribution $\hat{p}$ to ensure low transaction costs.

The Hessian of the objective is

\[ -\sum_{t=1}^T \frac{1}{(p^\top r_t)^2}\, r_t r_t^\top - \sum_{i=1}^n \frac{1}{p_i^2}\, e_i e_i^\top \]

The Hessian of the constraint is, up to a constant factor, the identity matrix. Hence the constant $H$ for the theorem can be taken to be one.

E.3 Maximum entropy distributions with $\ell_2$ regularization

The following mathematical program arises in problems concerning frequency estimation from a given sample. Examples include modelling of species distributions [DS06] and haplotype frequency estimation [HH06].

\[ \min \; H(p) \tag{13} \]
\[ \|A_i (p - \hat{p})\|_2 \le c \quad i \in [m], \qquad p \in \mathbb{S}_n \]

where $H : \mathbb{R}^n \mapsto \mathbb{R}$ is the negative of the entropy function, defined by $H(p) = \sum_{i=1}^n p_i \log p_i$. The Hessian of $H$ is $\nabla^2 H(p) = \mathrm{diag}(\frac{1}{p})$, i.e. the diagonal matrix with entries $\{\frac{1}{p_i}, i \in [n]\}$ on the diagonal. Hence $\nabla^2 H(p) \succeq I$.

The Hessian of the $i$'th constraint is $A_i A_i^\top$. For applications in which $A_i A_i^\top \succeq c \cdot I$ for all $i$, the constant $H$ for Theorem 3 is $H = \min\{1, c\}$.

F Proof of Corollary |U 

Proof of the corollary. Given mathematical program (1), we consider the following program

\[ f_j(x) + \delta \|x\|_2^2 - \delta \le 0 \quad \forall j \in [m], \qquad x \in \mathbb{S}_n \tag{14} \]

This mathematical program has strictly convex constraints, as

\[ \forall i \in [m] .\ \nabla^2 \left( f_i(x) + \delta \|x\|_2^2 - \delta \right) = \nabla^2 f_i(x) + 2\delta I \succeq 2\delta I \]

where the last inequality follows from our assumption that all constraints in (1) are convex and hence have positive semi-definite Hessian. Hence, to apply the theorem we can use $H = 2\delta$. In addition, by the triangle inequality the gradients of the constraints of (14) satisfy

\[ \left\| \nabla \left( f_i(x) + \delta \|x\|_2^2 - \delta \right) \right\|_2 \le \|\nabla f_i(x)\|_2 + 2\delta \le G + 2\delta = O(G) \]

where $G$ is the upper bound on the norm of the gradients of the constraints of (1). Therefore, Theorem 2 implies that an $\varepsilon$-approximate solution to (14) can be computed in $O(\frac{1}{\varepsilon \delta})$ iterations, each requiring a single gradient computation and additional $O(n)$ time.

Notice that if (1) is feasible, i.e. there exists $x^* \in \mathbb{S}_n$ such that $\max_{i \in [m]} f_i(x^*) \le 0$, then so is (14), since the same $x^*$ satisfies $\max_{i \in [m]} f_i(x^*) + \delta \|x^*\|_2^2 - \delta \le \delta \|x^*\|_2^2 - \delta \le 0$.

Given an $\varepsilon$-approximate solution to (14), denoted $y$, it satisfies

\[ \forall j \in [m] .\ f_j(y) + \delta \|y\|_2^2 - \delta \le \varepsilon \;\Rightarrow\; f_j(y) \le -\delta \|y\|_2^2 + \delta + \varepsilon \le \delta + \varepsilon \]

Hence $y$ is also an $(\varepsilon + \delta)$-approximate solution to (1).

Choosing $\delta = \varepsilon$, we conclude that a $2\varepsilon$-approximate solution to (1) can be computed in $O(\frac{1}{\varepsilon^2})$ iterations. □